# Chinese Image Understanding

Qwen2.5 VL 7B Captioner Relaxed GGUF

Qwen2.5-VL-7B-Captioner-Relaxed is a multimodal vision-language model built on the Qwen2.5-VL architecture and focused on image-to-text generation, such as image captioning.

Image-to-Text English

A multimodal model fine-tuned from InternVL-Chat-V1-5 that performs strongly on the MMBench benchmark.

A 1.6B-parameter multimodal model that combines SigLIP and Phi-1.5 and supports image understanding and question-answering tasks.

Transformers English


© 2025 AIbase